Rules (c,e,s-Wall)

Regular Expressions
A significant part of the data transferred over the internet is in text format. E-mail messages are text (or data transformed into text form). Web pages (apart from pictures, etc.) are in HTML format, which is also text-based and can be edited with a text editor.
Because x-Wall has to process text, it uses one of the best-known and most powerful text manipulation tools: Regular Expressions (RegExp). RegExp is not a programming language. A single RegExp is a line of letters and special symbols that serves as a template for the text fragment we would like to find (compare, replace) in a text or in internet content.
If a RegExp template matches a text, the result is called "Found"; otherwise it is "Not Found".
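The "Found" / "Not Found" idea can be illustrated with a short sketch. It uses Python's re module purely for illustration; the RegExp dialect used by x-Wall may differ in details, and the function name match_state and the sample text are assumptions made for this example.

```python
import re

def match_state(pattern: str, text: str) -> str:
    """Return "Found" if the template matches anywhere in the text, else "Not Found"."""
    return "Found" if re.search(pattern, text) else "Not Found"

# Hypothetical sample text: one header line of an e-mail message.
sample = "Subject: Cheap tickets to Paris"

print(match_state(r"Subject:\s*Cheap", sample))    # template matches the text
print(match_state(r"Subject:\s*Invoice", sample))  # template does not match
```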
Rules structure

A RegExp Rule consists of three PartRules: "Main", "Sub Rules" and "Replace".
  • 1. The "Main" PartRule (MainRule) tries to find a substring in the whole content. If the substring is not found, the whole Rule is not processed.
  • 2. The substring found by the MainRule is passed to the list of SubRules, which are applied in series. If any SubRule is "Not Found", the whole list is considered "Not Found". For example, you can find an HTML tag with the MainRule and then find (compare, replace) its attributes separately with SubRules.
  • 3. If the MainRule is in the "Found" state and the SubRules list (if it is not empty) is in the "Found" state too, then the "Replace" PartRule (ReplaceRule) is applied to the whole content. For example, a ReplaceRule may be used to change the subject of an e-mail if that e-mail contains something we can define in a MainRule or SubRule.
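The three steps can be sketched as follows. This is a simplified model written in Python for illustration, not the x-Wall implementation: the function names, the sample message and the host ads.example are hypothetical, and each PartRule is reduced to a single pattern.

```python
import re

def rule_found(content, main, sub_rules):
    """Steps 1 and 2: look for a MainRule match whose text satisfies every SubRule."""
    for m in re.finditer(main, content):
        fragment = m.group(0)
        # An empty SubRules list counts as "Found" (all() of nothing is True).
        if all(re.search(sub, fragment) for sub in sub_rules):
            return True
    return False

def apply_rule(content, main, sub_rules, replace_find, replace_with):
    """Step 3: apply the ReplaceRule to the whole content only if the Rule is "Found"."""
    if not rule_found(content, main, sub_rules):
        return content  # Rule not processed, content is left untouched
    # A function is used as the replacement so that replace_with is taken literally.
    return re.sub(replace_find, lambda _m: replace_with, content)

# Hypothetical e-mail: mark the subject if the body links to an assumed ad host.
mail = 'Subject: Hello\n\nVisit <a href="http://ads.example/x">our shop</a>'
result = apply_rule(
    mail,
    main=r"<a\b[^>]*>",                      # MainRule: any <a ...> tag
    sub_rules=[r'href="[^"]*ads\.example'],  # SubRule: href points to the ad host
    replace_find=r"(?m)^Subject: ",          # ReplaceRule works on the whole content
    replace_with="Subject: [AD] ",
)
print(result.splitlines()[0])
```

In this sketch the subject line is changed only because a link tag was found and its href attribute passed the SubRule; with a non-matching SubRule the message is returned unchanged.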
RegRule

Each PartRule has the same structure (RegRule) and consists of three strings (in RegExp or text format): "Find", "Compare" and "Replace".
  • The "Find" string is a RegExp template to be found in the content.
  • The "Compare" string (if enabled) is a RegExp template against which the substring found in the previous step (i.e. a part of the internet content) is compared.
  • If the substring is found and matches the template, it can be replaced with the given "Replace" text (if enabled).
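A RegRule can be modelled roughly as below. This is only a sketch under stated assumptions: the class and field names are invented for the example, None stands for a disabled "Compare" or "Replace" string, only the first "Find" match is considered, and the "Replace" string is treated as plain text.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RegRule:
    find: str                      # "Find": RegExp template to locate a substring
    compare: Optional[str] = None  # "Compare": None means the check is disabled
    replace: Optional[str] = None  # "Replace": None means no replacement is made

    def apply(self, content: str) -> Tuple[bool, str]:
        """Return (found, new_content) for the first "Find" match in content."""
        m = re.search(self.find, content)
        if m is None:
            return False, content
        fragment = m.group(0)
        if self.compare is not None and re.search(self.compare, fragment) is None:
            return False, content
        if self.replace is None:
            return True, content
        # Replace only the found substring, leaving the rest of the content as is.
        return True, content[:m.start()] + self.replace + content[m.end():]

# Hypothetical rule: strip the attributes of a <font> tag that sets a red colour.
rule = RegRule(find=r"<font\b[^>]*>", compare=r'(?i)color="?red', replace="<font>")
found, new_html = rule.apply('<p><font color="red">Hi</font></p>')
print(found, new_html)
```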
Composing and Testing Rules
Every PartRule and the whole Rule can be tested separately in the special dialogs to make sure they work as intended. Be careful when composing and modifying rules. Rules are applied to the whole internet content (including all internal protocol structures). Improper rules may cause application programs to stop working properly.
Content Rule Sample
Let us see how the RegExp filter works, using a banner filter as an example. A banner is usually a picture in one of several common sizes. The steps below describe how to detect and remove such a picture (see "Web: Ad/Banners by size").
  • Since a banner is an image, we first have to detect the image. An image is coded as an <IMG> tag in a web page or in an e-mail. The first RegExp finds all the <IMG> tags in the content.
  • Since a banner is an image of a predefined size, we have to test both dimensions of the image. If the dimensions match one of the predefined sizes, we decide that the image is a banner and go to the next step. If the sizes do not match, we look for the next <IMG> tag, and so on until the content ends.
  • Once we have decided that the image is a banner, we substitute the picture with a more attractive one (from our repository or any other). To do so, we find the SRC parameter of the picture and set it to the URL of the new image.
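The steps above can be sketched in a few lines. The sketch is written in Python and is not the actual "Web: Ad/Banners by size" rule: the list of banner sizes, the replacement URL and all names are assumptions chosen for the example, and the patterns only cover simple, well-formed <IMG> tags.

```python
import re

# Assumed banner sizes (width, height) and a hypothetical replacement image URL.
BANNER_SIZES = {(468, 60), (728, 90), (120, 600)}
NEW_SRC = "http://example.com/replacement.gif"

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)                  # step 1
WIDTH = re.compile(r'\bwidth\s*=\s*["\']?(\d+)', re.IGNORECASE)       # step 2
HEIGHT = re.compile(r'\bheight\s*=\s*["\']?(\d+)', re.IGNORECASE)     # step 2
SRC = re.compile(r'(\bsrc\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s>]+)', re.IGNORECASE)  # step 3

def replace_banner(match):
    """Return the <IMG> tag unchanged, or with a new SRC if its size is a banner size."""
    tag = match.group(0)
    w = WIDTH.search(tag)
    h = HEIGHT.search(tag)
    if not (w and h):
        return tag  # dimensions missing: not treated as a banner
    if (int(w.group(1)), int(h.group(1))) not in BANNER_SIZES:
        return tag  # sizes do not match: move on to the next <IMG> tag
    # Keep the "src=" part and swap only the value of the SRC parameter.
    return SRC.sub(lambda m: m.group(1) + '"' + NEW_SRC + '"', tag, count=1)

def filter_banners(html):
    """Apply the banner check to every <IMG> tag in the content."""
    return IMG_TAG.sub(replace_banner, html)

page = ('<p><img src="ad.gif" width="468" height="60">'
        '<img src="photo.jpg" width="300" height="200"></p>')
print(filter_banners(page))
```

Only the first image in the sample page has a size from the assumed list, so only its SRC parameter is rewritten; the second image is left as it was.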